Overview

Dataset statistics

Number of variables: 13
Number of observations: 5960
Missing cells: 5271
Missing cells (%): 6.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.5 MiB
Average record size in memory: 266.3 B

Variable types

Categorical: 3
Numeric: 10

Warnings

MORTDUE has 518 (8.7%) missing values
VALUE has 112 (1.9%) missing values
REASON has 252 (4.2%) missing values
JOB has 279 (4.7%) missing values
YOJ has 515 (8.6%) missing values
DEROG has 708 (11.9%) missing values
DELINQ has 580 (9.7%) missing values
CLAGE has 308 (5.2%) missing values
NINQ has 510 (8.6%) missing values
CLNO has 222 (3.7%) missing values
DEBTINC has 1267 (21.3%) missing values
YOJ has 415 (7.0%) zeros
DEROG has 4527 (76.0%) zeros
DELINQ has 4179 (70.1%) zeros
NINQ has 2531 (42.5%) zeros
CLNO has 62 (1.0%) zeros
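
As a quick consistency check, the per-variable missing counts in the warnings above should add up to the "Missing cells" figure in the dataset statistics. The sketch below only re-enters numbers from this report (the name `missing_counts` is an illustrative choice) and recomputes the total and the percentage of missing cells:

```python
# Missing-value counts per variable, copied from the warnings in this report.
missing_counts = {
    "MORTDUE": 518, "VALUE": 112, "REASON": 252, "JOB": 279,
    "YOJ": 515, "DEROG": 708, "DELINQ": 580, "CLAGE": 308,
    "NINQ": 510, "CLNO": 222, "DEBTINC": 1267,
}

n_variables = 13       # "Number of variables"
n_observations = 5960  # "Number of observations"

total_missing = sum(missing_counts.values())
total_cells = n_variables * n_observations
missing_pct = 100 * total_missing / total_cells

print(total_missing)          # 5271
print(f"{missing_pct:.1f}%")  # 6.8%
```

Both results agree with the overview, so the missing-cell percentage is taken over all cells (variables times observations), not over rows.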

Reproduction

Analysis started: 2021-01-06 03:28:10.433541
Analysis finished: 2021-01-06 03:28:36.154216
Duration: 25.72 seconds
Software version: pandas-profiling v2.10.0
Download configuration: config.yaml

Variables

BAD
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 337.7 KiB
Category counts: 0 (4771), 1 (1189)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 5960
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Value               Count   Frequency (%)
0                   4771    80.1%
1                   1189    19.9%

Histogram of lengths of the category

Value               Count   Frequency (%)
0                   4771    80.1%
1                   1189    19.9%

Most occurring characters

Value               Count   Frequency (%)
0                   4771    80.1%
1                   1189    19.9%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      5960    100.0%

Most frequent character per category

Value               Count   Frequency (%)
0                   4771    80.1%
1                   1189    19.9%

Most occurring scripts

Value               Count   Frequency (%)
Common              5960    100.0%

Most frequent character per script

Value               Count   Frequency (%)
0                   4771    80.1%
1                   1189    19.9%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               5960    100.0%

Most frequent character per block

Value               Count   Frequency (%)
0                   4771    80.1%
1                   1189    19.9%

LOAN
Real number (ℝ≥0)

Distinct: 540
Distinct (%): 9.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18607.9698
Minimum: 1100
Maximum: 89900
Zeros: 0
Zeros (%): 0.0%
Memory size: 46.7 KiB

Quantile statistics

Minimum: 1100
5-th percentile: 5900
Q1: 11100
Median: 16300
Q3: 23300
95-th percentile: 40000
Maximum: 89900
Range: 88800
Interquartile range (IQR): 12200

Descriptive statistics

Standard deviation: 11207.48042
Coefficient of variation (CV): 0.6022946371
Kurtosis: 6.932589768
Mean: 18607.9698
Median Absolute Deviation (MAD): 6000
Skewness: 2.023780712
Sum: 110903500
Variance: 125607617.3
Monotonicity: Increasing
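
Several of the LOAN statistics above follow directly from others. The sketch below re-derives them from the reported figures using the usual textbook relationships (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, mean = sum / count). The variable names are illustrative, and the inputs are the rounded values printed in this report, so agreement is only expected up to rounding:

```python
import math

# Values copied from the LOAN statistics in this report.
q1, q3 = 11100, 23300
minimum, maximum = 1100, 89900
mean = 18607.9698
std = 11207.48042
total = 110903500
n_observations = 5960  # LOAN has no missing values

value_range = maximum - minimum          # reported Range: 88800
iqr = q3 - q1                            # reported IQR: 12200
cv = std / mean                          # reported CV: 0.6022946371
variance = std ** 2                      # reported Variance: 125607617.3
mean_from_sum = total / n_observations   # reported Mean: 18607.9698

print(value_range, iqr)
print(round(cv, 4), round(variance, 1), round(mean_from_sum, 4))
```

The same relationships can be used to sanity-check the other numeric variables, using the non-missing count as the denominator where values are missing.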
Histogram with fixed size bins (bins=50)
Most frequent values

Value               Count   Frequency (%)
15000               105     1.8%
10000               81      1.4%
20000               74      1.2%
25000               73      1.2%
12000               69      1.2%
17000               51      0.9%
13000               50      0.8%
5000                50      0.8%
11000               47      0.8%
8000                44      0.7%
Other values (530)  5316    89.2%

Smallest values

Value               Count   Frequency (%)
1100                1       < 0.1%
1300                1       < 0.1%
1500                2       < 0.1%
1700                2       < 0.1%
1800                2       < 0.1%
2000                6       0.1%
2100                1       < 0.1%
2200                3       0.1%
2300                3       0.1%
2400                6       0.1%

Largest values

Value               Count   Frequency (%)
89900               1       < 0.1%
89800               1       < 0.1%
89200               1       < 0.1%
89000               1       < 0.1%
88900               2       < 0.1%
88800               1       < 0.1%
88500               1       < 0.1%
88300               1       < 0.1%
87500               1       < 0.1%
87000               1       < 0.1%

MORTDUE
Real number (ℝ≥0)

MISSING

Distinct: 5053
Distinct (%): 92.9%
Missing: 518
Missing (%): 8.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 73760.8172
Minimum: 2063
Maximum: 399550
Zeros: 0
Zeros (%): 0.0%
Memory size: 46.7 KiB

Quantile statistics

Minimum: 2063
5-th percentile: 18232.4
Q1: 46276
Median: 65019
Q3: 91488
95-th percentile: 151999.55
Maximum: 399550
Range: 397487
Interquartile range (IQR): 45212

Descriptive statistics

Standard deviation: 44457.60946
Coefficient of variation (CV): 0.6027266392
Kurtosis: 6.481866314
Mean: 73760.8172
Median Absolute Deviation (MAD): 21655.5
Skewness: 1.814480702
Sum: 401406367.2
Variance: 1976479039
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value               Count   Frequency (%)
42000               11      0.2%
47000               10      0.2%
65000               9       0.2%
45000               7       0.1%
50000               7       0.1%
124000              7       0.1%
70000               7       0.1%
62000               7       0.1%
55000               7       0.1%
58000               6       0.1%
Other values (5043) 5364    90.0%
(Missing)           518     8.7%

Smallest values

Value               Count   Frequency (%)
2063                1       < 0.1%
2619                1       < 0.1%
2800                1       < 0.1%
3372                1       < 0.1%
4000                1       < 0.1%
4447                1       < 0.1%
4500                1       < 0.1%
4641                1       < 0.1%
4734                1       < 0.1%
4742                1       < 0.1%

Largest values

Value               Count   Frequency (%)
399550              1       < 0.1%
399412              1       < 0.1%
397299              1       < 0.1%
391000              1       < 0.1%
371003              1       < 0.1%
369874              1       < 0.1%
367917              1       < 0.1%
367089              1       < 0.1%
365528              1       < 0.1%
363737              1       < 0.1%

VALUE
Real number (ℝ≥0)

MISSING

Distinct: 5381
Distinct (%): 92.0%
Missing: 112
Missing (%): 1.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 101776.0487
Minimum: 8000
Maximum: 855909
Zeros: 0
Zeros (%): 0.0%
Memory size: 46.7 KiB

Quantile statistics

Minimum: 8000
5-th percentile: 39050.7
Q1: 66075.5
Median: 89235.5
Q3: 119824.25
95-th percentile: 203717.2
Maximum: 855909
Range: 847909
Interquartile range (IQR): 53748.75

Descriptive statistics

Standard deviation: 57385.77533
Coefficient of variation (CV): 0.5638436159
Kurtosis: 24.36280488
Mean: 101776.0487
Median Absolute Deviation (MAD): 25764.5
Skewness: 3.053344267
Sum: 595186333
Variance: 3293127211
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value               Count   Frequency (%)
60000               15      0.3%
80000               14      0.2%
85000               12      0.2%
65000               11      0.2%
78000               10      0.2%
72000               9       0.2%
68000               8       0.1%
87000               8       0.1%
50000               8       0.1%
105000              7       0.1%
Other values (5371) 5746    96.4%
(Missing)           112     1.9%

Smallest values

Value               Count   Frequency (%)
8000                1       < 0.1%
8800                1       < 0.1%
9100                1       < 0.1%
9500                1       < 0.1%
11550               1       < 0.1%
11702               1       < 0.1%
12414               1       < 0.1%
12500               1       < 0.1%
12737               1       < 0.1%
12972               1       < 0.1%

Largest values

Value               Count   Frequency (%)
855909              1       < 0.1%
854114              1       < 0.1%
854112              1       < 0.1%
850000              1       < 0.1%
512650              1       < 0.1%
511164              1       < 0.1%
505000              1       < 0.1%
471827              1       < 0.1%
469771              1       < 0.1%
469748              1       < 0.1%

REASON
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 252
Missing (%): 4.2%
Memory size: 543.1 KiB
Category counts: DebtCon (3928), HomeImp (1780)

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 39956
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: HomeImp
2nd row: HomeImp
3rd row: HomeImp
4th row: HomeImp
5th row: HomeImp

Value               Count   Frequency (%)
DebtCon             3928    65.9%
HomeImp             1780    29.9%
(Missing)           252     4.2%

Histogram of lengths of the category

Value               Count   Frequency (%)
debtcon             3928    68.8%
homeimp             1780    31.2%
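
The two REASON frequency tables above report different percentages for the same counts. The figures are consistent with the first table dividing by all 5960 observations (missing values included) and the second, lowercased table dividing by the 5708 non-missing values. That reading is an inference from the numbers, not something the report states. A small sketch that reproduces both sets of percentages (variable names are illustrative):

```python
n_observations = 5960
counts = {"DebtCon": 3928, "HomeImp": 1780}  # counts copied from the report
n_missing = 252

n_present = sum(counts.values())  # rows with a non-missing REASON

shares = {}
for category, count in counts.items():
    share_of_all = 100 * count / n_observations
    share_of_present = 100 * count / n_present
    shares[category] = (f"{share_of_all:.1f}", f"{share_of_present:.1f}")
    print(f"{category}: {share_of_all:.1f}% of all rows, "
          f"{share_of_present:.1f}% of non-missing rows")
```

When comparing class shares across variables, it helps to note which denominator a table uses, since the gap grows with the fraction of missing values.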

Most occurring characters

Value               Count   Frequency (%)
o                   5708    14.3%
e                   5708    14.3%
D                   3928    9.8%
b                   3928    9.8%
t                   3928    9.8%
C                   3928    9.8%
n                   3928    9.8%
m                   3560    8.9%
H                   1780    4.5%
I                   1780    4.5%

Most occurring categories

Value               Count   Frequency (%)
Lowercase Letter    28540   71.4%
Uppercase Letter    11416   28.6%

Most frequent character per category

Lowercase Letter

Value               Count   Frequency (%)
o                   5708    20.0%
e                   5708    20.0%
b                   3928    13.8%
t                   3928    13.8%
n                   3928    13.8%
m                   3560    12.5%
p                   1780    6.2%

Uppercase Letter

Value               Count   Frequency (%)
D                   3928    34.4%
C                   3928    34.4%
H                   1780    15.6%
I                   1780    15.6%

Most occurring scripts

Value               Count   Frequency (%)
Latin               39956   100.0%

Most frequent character per script

Value               Count   Frequency (%)
o                   5708    14.3%
e                   5708    14.3%
D                   3928    9.8%
b                   3928    9.8%
t                   3928    9.8%
C                   3928    9.8%
n                   3928    9.8%
m                   3560    8.9%
H                   1780    4.5%
I                   1780    4.5%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               39956   100.0%

Most frequent character per block

Value               Count   Frequency (%)
o                   5708    14.3%
e                   5708    14.3%
D                   3928    9.8%
b                   3928    9.8%
t                   3928    9.8%
C                   3928    9.8%
n                   3928    9.8%
m                   3560    8.9%
H                   1780    4.5%
I                   1780    4.5%

JOB
Categorical

MISSING

Distinct: 6
Distinct (%): 0.1%
Missing: 279
Missing (%): 4.7%
Memory size: 494.6 KiB
Category counts: Other (2388), ProfExe (1276), Office (948), Mgr (767), Self (193)

Length

Max length: 7
Median length: 5
Mean length: 5.312092941
Min length: 3

Characters and Unicode

Total characters: 30178
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Other
2nd row: Other
3rd row: Other
4th row: Office
5th row: Other

Value               Count   Frequency (%)
Other               2388    40.1%
ProfExe             1276    21.4%
Office              948     15.9%
Mgr                 767     12.9%
Self                193     3.2%
Sales               109     1.8%
(Missing)           279     4.7%

Histogram of lengths of the category

Value               Count   Frequency (%)
other               2388    42.0%
profexe             1276    22.5%
office              948     16.7%
mgr                 767     13.5%
self                193     3.4%
sales               109     1.9%

Most occurring characters

Value               Count   Frequency (%)
e                   4914    16.3%
r                   4431    14.7%
f                   3365    11.2%
O                   3336    11.1%
t                   2388    7.9%
h                   2388    7.9%
P                   1276    4.2%
o                   1276    4.2%
E                   1276    4.2%
x                   1276    4.2%
Other values (8)    4252    14.1%

Most occurring categories

Value               Count   Frequency (%)
Lowercase Letter    23221   76.9%
Uppercase Letter    6957    23.1%

Most frequent character per category

Lowercase Letter

Value               Count   Frequency (%)
e                   4914    21.2%
r                   4431    19.1%
f                   3365    14.5%
t                   2388    10.3%
h                   2388    10.3%
o                   1276    5.5%
x                   1276    5.5%
i                   948     4.1%
c                   948     4.1%
g                   767     3.3%
Other values (3)    520     2.2%

Uppercase Letter

Value               Count   Frequency (%)
O                   3336    48.0%
P                   1276    18.3%
E                   1276    18.3%
M                   767     11.0%
S                   302     4.3%

Most occurring scripts

Value               Count   Frequency (%)
Latin               30178   100.0%

Most frequent character per script

Value               Count   Frequency (%)
e                   4914    16.3%
r                   4431    14.7%
f                   3365    11.2%
O                   3336    11.1%
t                   2388    7.9%
h                   2388    7.9%
P                   1276    4.2%
o                   1276    4.2%
E                   1276    4.2%
x                   1276    4.2%
Other values (8)    4252    14.1%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               30178   100.0%

Most frequent character per block

Value               Count   Frequency (%)
e                   4914    16.3%
r                   4431    14.7%
f                   3365    11.2%
O                   3336    11.1%
t                   2388    7.9%
h                   2388    7.9%
P                   1276    4.2%
o                   1276    4.2%
E                   1276    4.2%
x                   1276    4.2%
Other values (8)    4252    14.1%

YOJ
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 99
Distinct (%): 1.8%
Missing: 515
Missing (%): 8.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.922268136
Minimum: 0
Maximum: 41
Zeros: 415
Zeros (%): 7.0%
Memory size: 46.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 3
Median: 7
Q3: 13
95-th percentile: 24
Maximum: 41
Range: 41
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 7.573982249
Coefficient of variation (CV): 0.8488852984
Kurtosis: 0.3720724789
Mean: 8.922268136
Median Absolute Deviation (MAD): 5
Skewness: 0.9884600695
Sum: 48581.75
Variance: 57.36520711
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value               Count   Frequency (%)
0                   415     7.0%
1                   363     6.1%
2                   347     5.8%
5                   333     5.6%
4                   324     5.4%
6                   318     5.3%
3                   307     5.2%
9                   286     4.8%
10                  285     4.8%
8                   256     4.3%
Other values (89)   2211    37.1%
(Missing)           515     8.6%

Smallest values

Value               Count   Frequency (%)
0                   415     7.0%
0.1                 14      0.2%
0.2                 10      0.2%
0.25                1       < 0.1%
0.3                 7       0.1%
0.4                 9       0.2%
0.5                 7       0.1%
0.6                 4       0.1%
0.7                 4       0.1%
0.75                2       < 0.1%

Largest values

Value               Count   Frequency (%)
41                  3       0.1%
36                  5       0.1%
35                  5       0.1%
34                  2       < 0.1%
33                  2       < 0.1%
32                  1       < 0.1%
31                  12      0.2%
30                  30      0.5%
29.9                1       < 0.1%
29                  29      0.5%

DEROG
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 11
Distinct (%): 0.2%
Missing: 708
Missing (%): 11.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2545696877
Minimum: 0
Maximum: 10
Zeros: 4527
Zeros (%): 76.0%
Memory size: 46.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 10
Range: 10
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8460467771
Coefficient of variation (CV): 3.323438798
Kurtosis: 36.87276339
Mean: 0.2545696877
Median Absolute Deviation (MAD): 0
Skewness: 5.32087025
Sum: 1337
Variance: 0.715795149
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)
Most frequent values

Value               Count   Frequency (%)
0                   4527    76.0%
1                   435     7.3%
2                   160     2.7%
3                   58      1.0%
4                   23      0.4%
6                   15      0.3%
5                   15      0.3%
7                   8       0.1%
8                   6       0.1%
9                   3       0.1%
(Missing)           708     11.9%

Smallest values

Value               Count   Frequency (%)
0                   4527    76.0%
1                   435     7.3%
2                   160     2.7%
3                   58      1.0%
4                   23      0.4%
5                   15      0.3%
6                   15      0.3%
7                   8       0.1%
8                   6       0.1%
9                   3       0.1%

Largest values

Value               Count   Frequency (%)
10                  2       < 0.1%
9                   3       0.1%
8                   6       0.1%
7                   8       0.1%
6                   15      0.3%
5                   15      0.3%
4                   23      0.4%
3                   58      1.0%
2                   160     2.7%
1                   435     7.3%

DELINQ
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 14
Distinct (%): 0.3%
Missing: 580
Missing (%): 9.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4494423792
Minimum: 0
Maximum: 15
Zeros: 4179
Zeros (%): 70.1%
Memory size: 46.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 3
Maximum: 15
Range: 15
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.127265918
Coefficient of variation (CV): 2.508143357
Kurtosis: 23.56544868
Mean: 0.4494423792
Median Absolute Deviation (MAD): 0
Skewness: 4.023149577
Sum: 2418
Variance: 1.270728449
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
Most frequent values

Value               Count   Frequency (%)
0                   4179    70.1%
1                   654     11.0%
2                   250     4.2%
3                   129     2.2%
4                   78      1.3%
5                   38      0.6%
6                   27      0.5%
7                   13      0.2%
8                   5       0.1%
11                  2       < 0.1%
Other values (4)    5       0.1%
(Missing)           580     9.7%

Smallest values

Value               Count   Frequency (%)
0                   4179    70.1%
1                   654     11.0%
2                   250     4.2%
3                   129     2.2%
4                   78      1.3%
5                   38      0.6%
6                   27      0.5%
7                   13      0.2%
8                   5       0.1%
10                  2       < 0.1%

Largest values

Value               Count   Frequency (%)
15                  1       < 0.1%
13                  1       < 0.1%
12                  1       < 0.1%
11                  2       < 0.1%
10                  2       < 0.1%
8                   5       0.1%
7                   13      0.2%
6                   27      0.5%
5                   38      0.6%
4                   78      1.3%

CLAGE
Real number (ℝ≥0)

MISSING

Distinct: 5314
Distinct (%): 94.0%
Missing: 308
Missing (%): 5.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 179.7662752
Minimum: 0
Maximum: 1168.233561
Zeros: 2
Zeros (%): < 0.1%
Memory size: 46.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 68.9126539
Q1: 115.1167022
Median: 173.4666667
Q3: 231.5622781
95-th percentile: 321.6333333
Maximum: 1168.233561
Range: 1168.233561
Interquartile range (IQR): 116.4455759

Descriptive statistics

Standard deviation: 85.81009176
Coefficient of variation (CV): 0.4773425476
Kurtosis: 7.599549329
Mean: 179.7662752
Median Absolute Deviation (MAD): 58.27041147
Skewness: 1.343412043
Sum: 1016038.987
Variance: 7363.371848
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value               Count   Frequency (%)
102.5               7       0.1%
206.9666667         7       0.1%
177.5               6       0.1%
109.5666667         6       0.1%
123.7666667         6       0.1%
95.36666667         6       0.1%
117.6666667         5       0.1%
219.1333333         5       0.1%
232.3333333         5       0.1%
107.5333333         5       0.1%
Other values (5304) 5594    93.9%
(Missing)           308     5.2%

Smallest values

Value               Count   Frequency (%)
0                   2       < 0.1%
0.4867114508        1       < 0.1%
0.5071145295        1       < 0.1%
2.033333333         1       < 0.1%
2.820785578         1       < 0.1%
3.04438414          1       < 0.1%
4.412770061         1       < 0.1%
5.243341044         1       < 0.1%
6.133333333         1       < 0.1%
8.055265077         1       < 0.1%

Largest values

Value               Count   Frequency (%)
1168.233561         1       < 0.1%
1154.633333         1       < 0.1%
649.7471044         1       < 0.1%
648.3284926         1       < 0.1%
639.0581723         1       < 0.1%
638.2753611         1       < 0.1%
634.4618926         1       < 0.1%
632.1031857         1       < 0.1%
630.0333333         1       < 0.1%
629.0957663         1       < 0.1%

NINQ
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 16
Distinct (%): 0.3%
Missing: 510
Missing (%): 8.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.186055046
Minimum: 0
Maximum: 17
Zeros: 2531
Zeros (%): 42.5%
Memory size: 46.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 2
95-th percentile: 4
Maximum: 17
Range: 17
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.728674971
Coefficient of variation (CV): 1.457499782
Kurtosis: 9.786507278
Mean: 1.186055046
Median Absolute Deviation (MAD): 1
Skewness: 2.621984172
Sum: 6464
Variance: 2.988317156
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Most frequent values

Value               Count   Frequency (%)
0                   2531    42.5%
1                   1339    22.5%
2                   780     13.1%
3                   392     6.6%
4                   156     2.6%
5                   75      1.3%
6                   56      0.9%
7                   44      0.7%
10                  28      0.5%
8                   22      0.4%
Other values (6)    27      0.5%
(Missing)           510     8.6%

Smallest values

Value               Count   Frequency (%)
0                   2531    42.5%
1                   1339    22.5%
2                   780     13.1%
3                   392     6.6%
4                   156     2.6%
5                   75      1.3%
6                   56      0.9%
7                   44      0.7%
8                   22      0.4%
9                   11      0.2%

Largest values

Value               Count   Frequency (%)
17                  1       < 0.1%
14                  1       < 0.1%
13                  2       < 0.1%
12                  2       < 0.1%
11                  10      0.2%
10                  28      0.5%
9                   11      0.2%
8                   22      0.4%
7                   44      0.7%
6                   56      0.9%

CLNO
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 62
Distinct (%): 1.1%
Missing: 222
Missing (%): 3.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 21.2960962
Minimum: 0
Maximum: 71
Zeros: 62
Zeros (%): 1.0%
Memory size: 46.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 7
Q1: 15
Median: 20
Q3: 26
95-th percentile: 40
Maximum: 71
Range: 71
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 10.13893319
Coefficient of variation (CV): 0.4760935101
Kurtosis: 1.157672732
Mean: 21.2960962
Median Absolute Deviation (MAD): 6
Skewness: 0.7750517583
Sum: 122197
Variance: 102.7979663
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value               Count   Frequency (%)
16                  316     5.3%
19                  307     5.2%
24                  264     4.4%
23                  259     4.3%
21                  235     3.9%
20                  231     3.9%
18                  225     3.8%
25                  221     3.7%
15                  217     3.6%
22                  212     3.6%
Other values (52)   3251    54.5%
(Missing)           222     3.7%

Smallest values

Value               Count   Frequency (%)
0                   62      1.0%
1                   6       0.1%
2                   15      0.3%
3                   34      0.6%
4                   42      0.7%
5                   47      0.8%
6                   60      1.0%
7                   76      1.3%
8                   92      1.5%
9                   127     2.1%

Largest values

Value               Count   Frequency (%)
71                  2       < 0.1%
65                  3       0.1%
64                  5       0.1%
63                  1       < 0.1%
58                  3       0.1%
57                  1       < 0.1%
56                  6       0.1%
55                  14      0.2%
53                  2       < 0.1%
52                  5       0.1%

DEBTINC
Real number (ℝ≥0)

MISSING

Distinct: 4693
Distinct (%): 100.0%
Missing: 1267
Missing (%): 21.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.77991535
Minimum: 0.5244992154
Maximum: 203.3121487
Zeros: 0
Zeros (%): 0.0%
Memory size: 46.7 KiB

Quantile statistics

Minimum: 0.5244992154
5-th percentile: 20.51188558
Q1: 29.14003137
Median: 34.81826182
Q3: 39.00314063
95-th percentile: 42.76785245
Maximum: 203.3121487
Range: 202.7876495
Interquartile range (IQR): 9.863109256

Descriptive statistics

Standard deviation: 8.601746186
Coefficient of variation (CV): 0.2546408449
Kurtosis: 50.50404153
Mean: 33.77991535
Median Absolute Deviation (MAD): 4.814928269
Skewness: 2.852353416
Sum: 158529.1427
Variance: 73.99003745
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value               Count   Frequency (%)
42.435285           1       < 0.1%
40.56013033         1       < 0.1%
12.38674131         1       < 0.1%
38.70999145         1       < 0.1%
33.96477734         1       < 0.1%
39.8272172          1       < 0.1%
28.74989037         1       < 0.1%
40.25722171         1       < 0.1%
36.26033127         1       < 0.1%
34.41650781         1       < 0.1%
Other values (4683) 4683    78.6%
(Missing)           1267    21.3%

Smallest values

Value               Count   Frequency (%)
0.5244992154        1       < 0.1%
0.7202950067        1       < 0.1%
0.8381175254        1       < 0.1%
1.028930968         1       < 0.1%
1.565931047         1       < 0.1%
1.603507978         1       < 0.1%
1.85553998          1       < 0.1%
1.909225163         1       < 0.1%
1.920694367         1       < 0.1%
2.365195413         1       < 0.1%

Largest values

Value               Count   Frequency (%)
203.3121487         1       < 0.1%
144.1890013         1       < 0.1%
143.949605          1       < 0.1%
133.5282704         1       < 0.1%
114.0505277         1       < 0.1%
91.61259998         1       < 0.1%
84.61388869         1       < 0.1%
84.37903408         1       < 0.1%
78.65438605         1       < 0.1%
76.42147805         1       < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
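
The definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code the report itself runs; the function name `pearson_r` and the sample data are made up for the example:

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys over the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/(n-1) factors of the covariance and of the two standard
    # deviations cancel, so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)

# Shifting and rescaling either variable (by a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])
print(r, r_rescaled)
```

The second call illustrates the location and scale invariance mentioned above: both calls give the same coefficient up to floating-point error.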

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
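
As a sketch of that recipe, the snippet below ranks both variables (tied values share the mean of their positions, a common convention assumed here) and then applies the Pearson formula to the ranks. The helper names and the data are illustrative only:

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but clearly nonlinear

print(spearman_rho(x, y))  # perfect monotonic relationship
print(pearson_r(x, y))     # noticeably below 1
```

For the cubic relationship, ρ reaches 1 because the ordering is preserved exactly, while r stays below 1 because the points do not lie on a straight line.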

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
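
A direct, quadratic-time sketch of this pair-counting definition follows. It implements the plain variant described in the text (often called tau-a); libraries frequently report a tie-adjusted variant (tau-b) instead, which can differ when ties are present, so the numbers are not guaranteed to match the report's. The function name and data are illustrative:

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
        # product == 0 is a tie in x or y: counted as neither
    n_pairs = n * (n - 1) // 2
    return (concordant - discordant) / n_pairs

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)
print(tau)  # 8 concordant and 2 discordant pairs out of 10
```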

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
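
For illustration, here is a minimal sketch of a bias-corrected Cramér's V computed from a contingency table, following the correction commonly attributed to Bergsma (2013): the squared phi coefficient and the table dimensions are each adjusted by terms of order 1/(n − 1). The function name and the example tables are hypothetical, and the report's own implementation may differ in detail:

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (n_cols - 1) * (n_rows - 1) / (n - 1))
    rows_corr = n_rows - (n_rows - 1) ** 2 / (n - 1)
    cols_corr = n_cols - (n_cols - 1) ** 2 / (n - 1)
    denom = min(rows_corr - 1, cols_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

independent = [[10, 20], [20, 40]]  # rows are proportional: no association
perfect = [[50, 0], [0, 50]]        # each row determines the column

v_independent = cramers_v_corrected(independent)
v_perfect = cramers_v_corrected(perfect)
print(v_independent, v_perfect)
```

The two hypothetical tables hit the ends of the scale: proportional rows give 0, and a table where each row determines the column gives 1.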

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
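
One common way to compute a nullity correlation is to turn each column into a present/missing indicator and correlate the indicators. The sketch below assumes that approach (the report does not spell out its exact method) and applies it to the MORTDUE and VALUE entries of the ten "First rows" of this report, with `None` standing in for NaN. Ten rows are far too few to be representative; the point is only to show the mechanics:

```python
import math

# MORTDUE and VALUE for the ten "First rows" shown in this report (None = NaN).
mortdue = [25860.0, 70053.0, 13500.0, None, 97800.0,
           30548.0, 48649.0, 28502.0, 32700.0, None]
value = [39025.0, 68400.0, 16700.0, None, 112000.0,
         40320.0, 57037.0, 43034.0, 46740.0, 62250.0]

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the present/missing indicators of two columns."""
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return None  # a column that is always present (or always missing)
    return cov / math.sqrt(var_a * var_b)

nc = nullity_correlation(mortdue, value)
print(nc)
```

A positive value means the two columns tend to be missing together; here both are missing in row 3, while MORTDUE alone is missing in row 9.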

Sample

First rows

      BAD   LOAN   MORTDUE     VALUE   REASON     JOB   YOJ  DEROG  DELINQ       CLAGE  NINQ  CLNO    DEBTINC
   0    1   1100   25860.0   39025.0  HomeImp   Other  10.5    0.0     0.0   94.366667   1.0   9.0        NaN
   1    1   1300   70053.0   68400.0  HomeImp   Other   7.0    0.0     2.0  121.833333   0.0  14.0        NaN
   2    1   1500   13500.0   16700.0  HomeImp   Other   4.0    0.0     0.0  149.466667   1.0  10.0        NaN
   3    1   1500       NaN       NaN      NaN     NaN   NaN    NaN     NaN         NaN   NaN   NaN        NaN
   4    0   1700   97800.0  112000.0  HomeImp  Office   3.0    0.0     0.0   93.333333   0.0  14.0        NaN
   5    1   1700   30548.0   40320.0  HomeImp   Other   9.0    0.0     0.0  101.466002   1.0   8.0  37.113614
   6    1   1800   48649.0   57037.0  HomeImp   Other   5.0    3.0     2.0   77.100000   1.0  17.0        NaN
   7    1   1800   28502.0   43034.0  HomeImp   Other  11.0    0.0     0.0   88.766030   0.0   8.0  36.884894
   8    1   2000   32700.0   46740.0  HomeImp   Other   3.0    0.0     2.0  216.933333   1.0  12.0        NaN
   9    1   2000       NaN   62250.0  HomeImp   Sales  16.0    0.0     0.0  115.800000   0.0  13.0        NaN

Last rows

      BAD   LOAN   MORTDUE     VALUE   REASON     JOB   YOJ  DEROG  DELINQ       CLAGE  NINQ  CLNO    DEBTINC
5950    0  87500   55938.0   86794.0  DebtCon   Other  15.0    0.0     0.0  223.881040   0.0  16.0  36.753653
5951    0  88300   54004.0   94838.0  DebtCon   Other  16.0    0.0     0.0  193.702051   0.0  15.0  36.262691
5952    0  88500   50240.0   94687.0  DebtCon   Other  16.0    0.0     0.0  214.426206   0.0  16.0  34.751158
5953    0  88800   53307.0   94058.0  DebtCon   Other  16.0    0.0     0.0  218.304978   0.0  15.0  34.242465
5954    0  88900   48919.0   93371.0  DebtCon   Other  15.0    0.0     1.0  205.650159   0.0  15.0  34.818262
5955    0  88900   57264.0   90185.0  DebtCon   Other  16.0    0.0     0.0  221.808718   0.0  16.0  36.112347
5956    0  89000   54576.0   92937.0  DebtCon   Other  16.0    0.0     0.0  208.692070   0.0  15.0  35.859971
5957    0  89200   54045.0   92924.0  DebtCon   Other  15.0    0.0     0.0  212.279697   0.0  15.0  35.556590
5958    0  89800   50370.0   91861.0  DebtCon   Other  14.0    0.0     0.0  213.892709   0.0  16.0  34.340882
5959    0  89900   48811.0   88934.0  DebtCon   Other  15.0    0.0     0.0  219.601002   0.0  16.0  34.571519